Simultaneous $L^2$- and $L^\infty$-Adaptation in Nonparametric Regression



Johannes Schmidt-Hieber* 
March 14, 2013 

Abstract 

Consider the nonparametric regression framework. It is a classical result that the minimax rates for $L^2$- and $L^\infty$-risk over a Hölder ball with smoothness index $\beta$ are $n^{-\beta/(2\beta+1)}$ and $(n/\log n)^{-\beta/(2\beta+1)}$, respectively. By using a specific thresholding procedure, we construct an estimator that simultaneously achieves the optimal rates in $L^2$ and $L^\infty$ without prior knowledge of $\beta$, i.e. it is simultaneously adaptive.
1 Introduction

Assume that we are in the Gaussian white noise model, that is, we observe

$$dY_t = f(t)\,dt + n^{-1/2}\,dW_t, \quad t \in [0,1], \tag{1.1}$$

with $f$ the unknown regression function and $(W_t)_{t \ge 0}$ a Brownian motion. Nonparametric estimation of $f$ has been studied under various aspects, but most of the work deals with either the $L^2$- or the $L^\infty$-loss (cf. for example Tsybakov [6]). The $L^2$-loss is in a sense intrinsic to the white noise model, whereas the $L^\infty$-loss allows for uniform control over $\hat f - f$, which is desirable for many applications and has an evident visual interpretation. In this paper we construct an estimator that achieves the minimax adaptive rate simultaneously for both $L^2$- and $L^\infty$-loss.

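We consider estimation (and hence function spaces) on $[0,1]$. Let $\Theta(\beta,Q)$ denote the ball of Hölder continuous functions with index $\beta > 0$ and radius $Q$, that is, the class of all functions $f$ such that $f^{(\lfloor\beta\rfloor)}$ exists with $\lfloor\beta\rfloor := \max\{u \in \mathbb{N} \mid u < \beta\}$ and Hölder norm

$$\sup_{x\in[0,1]} |f(x)| + \sup_{x,y\in[0,1],\ x\neq y} \frac{|f^{(\lfloor\beta\rfloor)}(x) - f^{(\lfloor\beta\rfloor)}(y)|}{|x-y|^{\beta-\lfloor\beta\rfloor}} \tag{1.2}$$

bounded by $Q$. It is known that there are estimators, $\hat f_2$ and $\hat f_\infty$ say, such that for all $0 < \beta < \infty$,

$$\sup_{f\in\Theta(\beta,Q)} E_f\|\hat f_2 - f\|_2 \lesssim n^{-\beta/(2\beta+1)} \tag{1.3}$$

and

$$\sup_{f\in\Theta(\beta,Q)} E_f\|\hat f_\infty - f\|_\infty \lesssim \Big(\frac{n}{\log n}\Big)^{-\beta/(2\beta+1)}. \tag{1.4}$$

Here, $\|\cdot\|_2$ and $\|\cdot\|_\infty$ denote the $L^2$-norm and $L^\infty$-norm (uniform norm) on $[0,1]$, respectively. Further, these rates are minimax, even if $\beta$ is known, and thus $\hat f_2$ and $\hat f_\infty$ are adaptive with respect to $\|\cdot\|_2$ and $\|\cdot\|_\infty$, respectively.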

One would like to embed this problem within the following, more general setting. Given a sequence of nonparametric experiments $(\Omega_n, \mathcal{A}_n, (P_{\theta,n} : \theta \in \Theta))$, suppose that we know how to construct estimators $\hat\theta_1$ and $\hat\theta_2$ converging to the true parameter $\theta^0 \in \Theta$ with respect to two risks, $R_1$ and $R_2$ say, such that $R_1(\hat\theta_1,\theta^0) \lesssim \epsilon_{n,1}$ and $R_2(\hat\theta_2,\theta^0) \lesssim \epsilon_{n,2}$. Does there exist an estimator $\hat\theta$ such that $\epsilon_{n,1}^{-1}R_1(\hat\theta,\theta^0) + \epsilon_{n,2}^{-1}R_2(\hat\theta,\theta^0) \lesssim 1$? We believe that there is no general answer to this problem. In particular, a naive approach based on averaging of $\hat\theta_1$ and $\hat\theta_2$ (as for instance $\hat\theta = \frac12(\hat\theta_1 + \hat\theta_2)$) will not work.

By using a wavelet decomposition together with term-by-term hard thresholding, an $L^\infty$-adaptive estimator can be constructed, whereas a suitable block thresholding procedure will lead to $L^2$-adaptation (for an overview on wavelet estimation, cf. Antoniadis [1]). Refining the block thresholding, Cai [2] found an estimator that is simultaneously adaptive with respect to $L^2$- and squared pointwise risk. A similar procedure has been studied in Cai and Silverman [3]. However, these estimators do not achieve the optimal rate in $L^\infty$-risk (cf. Theorem 3 below and the remarks thereafter). This motivates the question of whether there is an estimator $\hat f$ that adapts simultaneously to $L^2$ and $L^\infty$ in the sense that both (1.3) and (1.4) hold for $\hat f$.



Heuristically, one might want to argue as follows. Suppose that the smoothness index $\beta$ is known. Then, in order to obtain a rate optimal estimator in $L^2$-norm and uniform norm, the bandwidth of a kernel estimator has to be chosen of order $h_{n,2} = n^{-1/(2\beta+1)}$ and $h_{n,\infty} = (\log n/n)^{1/(2\beta+1)}$, respectively. Using the bandwidth $h_{n,\infty}$ for $L^2$-risk or, vice versa, $h_{n,2}$ for $L^\infty$-risk will result in additional $\log n$ factors in the rate. Since for a kernel estimator we have to fix one bandwidth $h_n$, this suggests that, even if $\beta$ is known, (1.3) and (1.4) cannot hold without additional log-factors in the rate. However, as we show, this heuristic reasoning is wrong and simultaneous adaptation is indeed possible.
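To make the size of the gap between the two rates concrete, the following sketch evaluates the two minimax rates and the two bandwidth orders from the heuristic above. The function names are illustrative and all constants are set to one, which is an assumption made only for this illustration.

```python
import math


def l2_rate(n: int, beta: float) -> float:
    """L2 minimax rate n^(-beta/(2 beta + 1)), constant set to one."""
    return n ** (-beta / (2 * beta + 1))


def sup_rate(n: int, beta: float) -> float:
    """Sup-norm minimax rate (n / log n)^(-beta/(2 beta + 1)), constant set to one."""
    return (n / math.log(n)) ** (-beta / (2 * beta + 1))


def bandwidth_orders(n: int, beta: float) -> tuple[float, float]:
    """Orders of the kernel bandwidths h_{n,2} and h_{n,infty} from the heuristic."""
    h_2 = n ** (-1 / (2 * beta + 1))
    h_inf = (math.log(n) / n) ** (1 / (2 * beta + 1))
    return h_2, h_inf


if __name__ == "__main__":
    n, beta = 2 ** 16, 1.0
    print(l2_rate(n, beta), sup_rate(n, beta), bandwidth_orders(n, beta))
```

With these conventions the two rates differ exactly by the factor $(\log n)^{\beta/(2\beta+1)}$, and $h_{n,\infty}$ is the larger bandwidth as soon as $\log n > 1$.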

The construction of our estimator is based on wavelet thresholding. In order to achieve simultaneous adaptation, a very specific thresholding procedure is required that we call truncated block thresholding. It consists of two steps. First, by using block thresholding, all resolution levels that contain a block on which $L^2$-signal is detected are taken into account for the reconstruction. In a second step, for every resolution level $j$ on which $L^2$-signal but no $L^\infty$-signal was found, we assign a random variable $t_j$ and truncate all estimated wavelet coefficients on this level with absolute value larger than $t_j$. In this step, peaks in the reconstruction are smoothed out.

Let us stress that the proposed truncation is of a different nature than usual thresholding. In Section 2 we motivate the truncation step by projection on the true parameter space $\Theta(\beta,Q)$. By construction, the truncation does not affect the $L^2$-rate but reduces the order of the variance for the $L^\infty$-risk, which would lead to a suboptimal rate otherwise. To summarize, our results imply that one can improve and robustify a wavelet estimator by additionally truncating certain large empirical wavelet coefficients.

The paper is organized as follows. In Section 2 we derive an estimator that achieves the minimax rate of convergence with respect to both $L^2$- and $L^\infty$-risk for known Hölder smoothness $\beta$. Extending this idea, we construct in Section 3 the truncated block thresholding estimator that is simultaneously adaptive for $L^2$- and $L^\infty$-risk. A small numerical study is carried out in Section 4. Proofs can be found in Section 5.



2 Simultaneous estimation for known smoothness 

Assume that $f \in \Theta(\beta,Q)$ with $\beta$ and $Q$ known. Consider a pair of scaling and wavelet functions $(\phi,\psi)$ generated by a multiresolution analysis of $L^2[0,1]$ that is $s$-regular, i.e. $\phi$ and $\psi$ have compact support and are $s$-times continuously differentiable (for construction of wavelet bases on $[0,1]$, see also Cohen et al. [4]). Then, for every $f \in L^2[0,1]$,

$$f = \sum_k c_k\phi_k + \sum_{j=0}^\infty \sum_{k\in I_j} d_{j,k}\psi_{j,k} = \sum_{j=-1}^\infty \sum_{k\in I_j} d_{j,k}\psi_{j,k}$$

with $\psi_{-1,k} := \phi_k := \phi(\cdot - k)$ and $d_{-1,k} := c_k := \int f(u)\phi_k(u)\,du$ as well as, for $j \ge 0$, $\psi_{j,k} := 2^{j/2}\psi(2^j\cdot - k)$ and $d_{j,k} := \int f(u)\psi_{j,k}(u)\,du$. For any resolution level $j$, the expansion contains of the order of $2^j$ coefficients $d_{j,k}$, i.e. $|I_j| \lesssim 2^j$. Given the Gaussian white noise model (1.1), we observe

$$Y_{j,k} := \int \psi_{j,k}(u)\,dY_u = d_{j,k} + \frac{1}{\sqrt n}\,\varepsilon_{j,k}, \quad \text{for } j \ge -1,\ k \in I_j,$$

where $(\varepsilon_{j,k})_{j,k}$ is an array of i.i.d. standard normal random variables.

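For a fixed wavelet, $f \in \Theta(\beta,Q)$ and $\beta < s$ imply that there is a constant $c = c(\beta,Q)$ such that $|d_{j,k}| \le c2^{-\frac{j}{2}(2\beta+1)}$ for all $j \ge -1$, $k$ (cf. for example Cai [2], Lemma 1 (ii)). Choose integer sequences $(J_n)_n := (J_n(\beta))_n$ and $(\tilde J_n)_n := (\tilde J_n(\beta))_n$ such that $2^{J_n} \asymp n^{1/(2\beta+1)}$ and $2^{\tilde J_n} \asymp (n/\log n)^{1/(2\beta+1)}$. Together with the bound on the wavelet coefficients, this gives

$$\Big\|\sum_{j=J_n+1}^\infty \sum_k d_{j,k}\psi_{j,k}\Big\|_2 \lesssim n^{-\beta/(2\beta+1)} \quad\text{and}\quad \Big\|\sum_{j=\tilde J_n+1}^\infty \sum_k d_{j,k}\psi_{j,k}\Big\|_\infty \lesssim \Big(\frac{n}{\log n}\Big)^{-\beta/(2\beta+1)}.$$

Thus, as far as rates of convergence are concerned, there is no relevant $L^2$- and $L^\infty$-signal on resolution levels above $J_n$ and $\tilde J_n$, respectively. Consequently, the wavelet decomposition of $f$ has three different regimes (I) to (III), namely

$$f = \underbrace{\sum_{j=-1}^{\tilde J_n}\sum_k d_{j,k}\psi_{j,k}}_{\text{(I): } L^2\text{- and } L^\infty\text{-signal}} + \underbrace{\sum_{j=\tilde J_n+1}^{J_n}\sum_k d_{j,k}\psi_{j,k}}_{\text{(II): } L^2\text{- but no } L^\infty\text{-signal}} + \underbrace{\sum_{j=J_n+1}^{\infty}\sum_k d_{j,k}\psi_{j,k}}_{\text{(III): neither } L^2\text{- nor } L^\infty\text{-signal}}. \tag{2.1}$$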

For resolution levels $j \le J_n$, a reasonable choice is to take $Y_{j,k}$ as an estimator of $d_{j,k}$. However, it might happen that the function $Y_{j,k}\psi_{j,k}$ itself is not in $\Theta(\beta,Q)$. Then, instead of estimating $d_{j,k}$ by $Y_{j,k}$, we can improve by projecting $Y_{j,k}$ on $[-c2^{-\frac{j}{2}(2\beta+1)}, c2^{-\frac{j}{2}(2\beta+1)}]$ (recall that for the moment $\beta$ and $Q$ are known). Consequently, the estimator for the scaling/wavelet coefficients $d_{j,k}$ is given by

$$\hat d_{j,k} := \operatorname{sign}(Y_{j,k})\big(|Y_{j,k}| \wedge c2^{-\frac{j}{2}(2\beta+1)}\big) \tag{2.2}$$

and the estimator for $f$ is

$$\hat f := \sum_{j=-1}^{J_n}\sum_k \hat d_{j,k}\psi_{j,k}. \tag{2.3}$$


Theorem 1. Given model (1.1) and an $s$-regular wavelet. The estimator $\hat f$ defined in (2.3) attains simultaneously the minimax rate in $L^2$ and $L^\infty$ provided that $\beta < s$.

Notice that for Theorem 1 it is not essential that the best possible constant $c$ for the construction of $\hat f$ is used. Since the maximum over $n$ i.i.d. standard normal distributed random variables behaves like $O(\sqrt{\log n})$ a.s., the projection in (2.2) only affects resolution levels $j$ for which $2^{-\frac{j}{2}(2\beta+1)} \lesssim (\frac{\log n}{n})^{1/2}$. Using (2.1) this coincides with the regime (II).
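The projection step (2.2) amounts to clipping each noisy coefficient to a level-dependent interval. A minimal sketch, assuming the coefficients of one resolution level are available as an array; the function name `project_coefficients` and the argument layout are hypothetical and not taken from the paper:

```python
import numpy as np


def project_coefficients(y, j, beta, c):
    """Clip the noisy coefficients y of level j to [-b, b] as in (2.2),
    where b = c * 2^(-j (2 beta + 1) / 2) is the known Hoelder bound."""
    bound = c * 2.0 ** (-j * (2 * beta + 1) / 2)
    y = np.asarray(y, dtype=float)
    # sign(y) * min(|y|, bound) leaves small coefficients untouched
    # and pulls large ones back to the boundary of the interval.
    return np.sign(y) * np.minimum(np.abs(y), bound)


if __name__ == "__main__":
    print(project_coefficients([1.0, -0.1], j=2, beta=0.5, c=1.0))
```

In this sketch the smoothness `beta` and the constant `c` have to be supplied by the user, which mirrors the assumption of this section that $\beta$ and $Q$ are known.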



3 Truncated block thresholding 

By imitating the ideas from the previous section, we derive an estimator that adapts simultaneously under $L^2$- and $L^\infty$-risk. The crucial point in the construction is the projection on the interval $[-c2^{-\frac{j}{2}(2\beta+1)}, c2^{-\frac{j}{2}(2\beta+1)}]$, for which knowledge of $\beta$ is required. Combining block thresholding with truncation, we show that the bound can be estimated accurately enough, such that the resulting estimator attains the optimal rates of convergence.

Fix a resolution level $j$ and group the indices into blocks of length $\lceil\log n\rceil$, which is defined as the smallest integer larger than $\log n$. Denote the blocks by $B_{j,\ell}$, $\ell = 1,2,\ldots$ In general $\lceil\log n\rceil$ does not divide the number of coefficients and on every resolution level there will be one block with size $< \lceil\log n\rceil$. Since there are of the order of $2^j$ coefficients, the number of blocks is $\lesssim 2^j/\log n$. For a constant $\gamma > 0$, define

$$D_j := \Big\{S : \exists \ell \text{ such that } S \subset B_{j,\ell} \text{ and } \sum_{(j,k)\in S} Y_{j,k}^2 \ge \gamma\frac{\log n}{n}\Big\},$$

$$L_j := \min_{S\in D_j} |S|,$$

and $L_j := \infty$ if $D_j$ is empty. Thus, $L_j$ is the smallest number of observations, all belonging to the same block, that have sum of squares at least the threshold $\gamma\frac{\log n}{n}$. Let us also mention that $L_j$ takes values in the set $\{1,\ldots,\lceil\log n\rceil\}\cup\{\infty\}$. The estimator for the $(j,k)$-th wavelet coefficient is given by

$$\hat d_{j,k} := \operatorname{sign}(Y_{j,k})\big(|Y_{j,k}| \wedge t_j\big), \quad\text{with}\quad t_j := \sqrt{\frac{\gamma\log n}{n(L_j - 1)}}, \tag{3.1}$$

using the obvious rules $1/0 = \infty$, $1/\infty = 0$, and $|x| \wedge \infty = |x|$. Notice that the amount of regularization increases as $L_j$ grows. Pick a $J$ such that $2^J \asymp n$ and consider the truncated block thresholding estimator

$$\hat f_\gamma := \sum_{j=-1}^{J}\sum_k \hat d_{j,k}\psi_{j,k}. \tag{3.2}$$
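The construction of $L_j$, $t_j$ and the truncated coefficients on a single resolution level can be sketched as follows. This is an illustration under stated assumptions, not the author's implementation: the function names are hypothetical, the level is passed as a flat array of noisy coefficients, blocks are taken as consecutive index ranges, and the block length is computed with `math.ceil`, which agrees with the definition above whenever $\log n$ is not an integer.

```python
import math

import numpy as np


def smallest_detecting_subset(y_level, n, gamma):
    """L_j: smallest number of coefficients inside one block whose squared
    sum reaches gamma * log(n) / n; math.inf if no block reaches it."""
    block_len = math.ceil(math.log(n))
    threshold = gamma * math.log(n) / n
    y2 = np.asarray(y_level, dtype=float) ** 2
    best = math.inf
    for start in range(0, len(y2), block_len):
        # The smallest qualifying subset of a block consists of its largest squares.
        csum = np.cumsum(np.sort(y2[start:start + block_len])[::-1])
        hits = np.nonzero(csum >= threshold)[0]
        if hits.size:
            best = min(best, int(hits[0]) + 1)
    return best


def truncation_level(L, n, gamma):
    """t_j from (3.1), using the conventions 1/0 = inf and 1/inf = 0."""
    if L == 1:
        return math.inf
    if L == math.inf:
        return 0.0
    return math.sqrt(gamma * math.log(n) / (n * (L - 1)))


def truncate_level(y_level, n, gamma=7.0):
    """Return the truncated coefficients of one level together with (L_j, t_j)."""
    L = smallest_detecting_subset(y_level, n, gamma)
    t = truncation_level(L, n, gamma)
    y = np.asarray(y_level, dtype=float)
    return np.sign(y) * np.minimum(np.abs(y), t), L, t
```

Note that for $L_j = 2$ no coefficient is actually changed, since every single squared observation is then below the threshold; truncation only becomes active for $L_j \ge 3$.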
Theorem 2. Given model (1.1) and an $s$-regular wavelet. For $\gamma \ge 7$, the estimator $\hat f_\gamma$ defined in (3.2) is simultaneously adaptive with respect to both the $L^2$- and $L^\infty$-risk, that is, for any $0 < \beta < s$ and any $0 < Q < \infty$,

$$\limsup_{n\to\infty}\ \sup_{f\in\Theta(\beta,Q)} \Big( n^{\beta/(2\beta+1)} E_f\big[\|\hat f_\gamma - f\|_2\big] + \Big(\frac{n}{\log n}\Big)^{\beta/(2\beta+1)} E_f\big[\|\hat f_\gamma - f\|_\infty\big] \Big) < \infty.$$

The estimation procedure and Theorem 2 carry over to the white noise model with arbitrary (but known) noise level $0 < \sigma < \infty$. In this case, we observe

$$dY_t = f(t)\,dt + \sigma n^{-1/2}\,dW_t, \quad t \in [0,1]. \tag{3.3}$$

Given that $\sigma$ is known, we can compute $\hat f_\gamma$ based on the scaled observations $(\sigma^{-1}Y_{j,k})_{j,k}$, which yields an estimator for $\sigma^{-1}f$. Multiplying the resulting estimator by $\sigma$ finally gives an estimator for $f$ satisfying the conclusion of Theorem 2.

It is crucial to the procedure that the length of the blocks $B_{j,\ell}$ is of order $\log n$. It has been pointed out by Cai [2] that a block length $(\log n)^s$ with $s > 1$ will lead to suboptimal rates even for the pointwise loss and that $s < 1$ necessarily introduces a log-factor in the $L^2$-rates.

In (3.1), we set $\hat d_{j,k} = \operatorname{sign}(Y_{j,k})t_j$ if $|Y_{j,k}| > t_j$. Theorem 2 remains true for any choice with $|\hat d_{j,k}| \le t_j$. Nevertheless, the truncation step (3.1) is necessary. To see this, consider for $\gamma > 0$ the block thresholding estimator

$$\hat f_{\gamma,B} := \sum_{j=-1}^{J} I_{\{L_j<\infty\}} \sum_k Y_{j,k}\psi_{j,k}. \tag{3.4}$$

For the estimator $\hat f_{\gamma,B}$, all resolution levels are taken into account that contain a block $B_{j,\ell}$ with $\sum_{(j,k)\in B_{j,\ell}} Y_{j,k}^2 \ge \gamma\frac{\log n}{n}$. On each of these levels, the wavelet coefficients are estimated by $Y_{j,k}$ without additional truncation. One should notice that $\hat f_{\gamma,B}$ only differs from the truncated block thresholding estimator (3.2) on resolution levels for which $1 < L_j < \infty$. However, as stated in the following theorem, this estimator is suboptimal by a $(\log n)^{1/(4\beta+2)}$-factor, implying that truncation is indeed necessary.



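Theorem 3. Given model (1.1) and an $s$-regular wavelet. For $\gamma \ge 7$, let $\hat f_{\gamma,B}$ be defined as in (3.4). For any $0 < \beta < s$ and any $0 < Q < \infty$, $\hat f_{\gamma,B}$ does not achieve the adaptive rate with respect to the $L^\infty$-loss and

$$\sup_{f\in\Theta(\beta,Q)} E_f\big[\|\hat f_{\gamma,B} - f\|_\infty\big] \gtrsim \Big(\frac{n}{\log n}\Big)^{-\beta/(2\beta+1)} (\log n)^{1/(4\beta+2)}.$$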

In order to avoid too many technicalities, Theorem 3 is proved for the specific block thresholding procedure (3.4) only. However, the result can be extended to many other thresholding estimators. In particular, similar arguments hold if a soft thresholding procedure is applied. The proof of Theorem 3 is based on the following observation: Consider an $f$ with wavelet coefficients $|d_{j,k}| \asymp 2^{-\frac{j}{2}(2\beta+1)}$ for all $j,k$. Then, for the block thresholding estimator (3.4), we can find an integer $J_n = J_n(\beta)$ with $2^{J_n} \asymp n^{1/(2\beta+1)}$ such that $\hat d_{j,k} = Y_{j,k}$ for $j \le J_n$ and all $k$. By the extreme value behavior of the maximum of standard normal random variables there exists, with high probability, a $k^*$ such that $\varepsilon_{J_n,k^*} \ge q\sqrt{\log n}$ for some $q > 0$. Consequently,

$$\|Y_{J_n,k^*}\psi_{J_n,k^*}\|_\infty \gtrsim \Big(\frac{n}{\log n}\Big)^{-\beta/(2\beta+1)} (\log n)^{1/(4\beta+2)}. \tag{3.5}$$

By some refined analysis, we obtain that with high probability, this term is a lower bound of the $L^\infty$-loss, i.e. $\|\hat f_{\gamma,B} - f\|_\infty \gtrsim \|Y_{J_n,k^*}\psi_{J_n,k^*}\|_\infty$. Taking expectation and using (3.5), this gives the lower bound of Theorem 3. A complete proof can be found in the appendix.



Next, let us describe the link between (2.2) and (3.1). For that, we need to study the behavior of the processes $(L_j)_j$ and $(t_j)_j$. Recall that $f \in \Theta(\beta,Q)$ implies $|d_{j,k}| \lesssim 2^{-\frac{j}{2}(2\beta+1)}$. For convenience assume further that for the remaining part of this section also $|d_{j,k}| \gtrsim 2^{-\frac{j}{2}(2\beta+1)}$. Using the decomposition of $f$ into the three regimes in (2.1) together with Lemma 2 and (5.2) in the appendix, we obtain that for sufficiently large $n$,

$$\begin{aligned}
(L_j,t_j) &= (1,\infty), && \text{for } j \text{ with } 2^j \le 2^{\tilde J_n} \text{ (regime (I))},\\
(L_j,t_j) &\asymp \Big(2^{j(2\beta+1)}\frac{\log n}{n},\ 2^{-\frac{j}{2}(2\beta+1)}\Big), && \text{for } j \text{ with } 2^{\tilde J_n} < 2^j \le 2^{J_n} \text{ (regime (II))},\\
(L_j,t_j) &= (\infty,0), && \text{for } j \text{ with } 2^{J_n} < 2^j \text{ (regime (III))},
\end{aligned} \tag{3.6}$$

with probability larger than $1 - 1/n$. Therefore, the process $(t_j)_j$ has two phase transitions. For resolution levels in regime (I), $L_j = 1$, so $t_j = \infty$ and $\hat d_{j,k} = Y_{j,k}$, i.e. there is no additional truncation. On the critical resolution levels (regime (II)), we have the same truncation behavior as for the non-adaptive estimator $\hat f$ in (2.2), namely $t_j \asymp 2^{-\frac{j}{2}(2\beta+1)}$ (cf. also the remark after Theorem 1). Finally, for regime (III), containing the irrelevant resolution levels, we find $L_j = \infty$ implying $\hat d_{j,k} = 0$. For numerical simulations of $(L_j)_j$, see Section 4.
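The three regimes can be located numerically once the unspecified constants are fixed. The sketch below sets all constants in the two cut-off levels to one, which is purely an assumption for illustration (the text only determines them up to constant multiples), and classifies a resolution level by comparing $2^j$ with $(n/\log n)^{1/(2\beta+1)}$ and $n^{1/(2\beta+1)}$. The function name `regime` is hypothetical.

```python
import math


def regime(j: int, n: int, beta: float) -> str:
    """Classify resolution level j into regime I, II or III,
    with all constants in the two cut-off levels set to one."""
    sup_cutoff = (n / math.log(n)) ** (1 / (2 * beta + 1))  # end of regime (I)
    l2_cutoff = n ** (1 / (2 * beta + 1))                   # end of regime (II)
    if 2 ** j <= sup_cutoff:
        return "I"
    if 2 ** j <= l2_cutoff:
        return "II"
    return "III"


if __name__ == "__main__":
    for j in range(3, 8):
        print(j, regime(j, n=2 ** 16, beta=1.0))
```

Because the two cut-offs differ only by the factor $(\log n)^{1/(2\beta+1)}$, regime (II) typically contains very few levels for realistic sample sizes.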



The previous argument shows that the truncation values $(t_j)_j$ have the right order in the sense that $t_j \asymp 2^{-\frac{j}{2}(2\beta+1)}$ for resolution levels in regime (II). Using (3.6) it is now straightforward to deduce that the estimator $\hat f_\gamma$ achieves the adaptive rate with respect to $L^\infty$-risk. However, it is still possible that the bias introduced by the truncation leads to a suboptimal $L^2$-rate. Whereas, as a first guess, one might believe that the truncation in (3.1) corrects for a few 'outliers' only, (3.6) implies that it indeed affects many of the estimated wavelet coefficients on the critical resolution levels in regime (II). To see why truncation does not influence the $L^2$-rate, let us refer to part (IV) in the proof of Theorem 2. For the bound of the $L^\infty$-risk any value $t_j \propto \big(\frac{\log n}{n(L_j-1)}\big)^{1/2}$ would work, but the control of the $L^2$-bias motivates the particular form of $t_j$ in (3.1).
4 Numerical simulations

In this section, we illustrate the truncated block thresholding estimator by a small Monte Carlo analysis. The main focus of this study is to investigate the performance of the estimator $\hat f_\gamma$ under changes of $\gamma$. The benchmark functions 'Bumps', 'Blocks', 'HeaviSine', and 'Doppler', usually investigated for wavelet thresholding, are not well-suited, as they have a few abrupt changes that dominate the $L^\infty$-risk. Instead, we study the procedure for the (smooth) function

$$f = \sqrt{2}\sin(2\pi\,\cdot). \tag{4.1}$$

Figure 1: $L^2$- and $L^\infty$-risk of the estimator $\hat f_\gamma$ in dependence on $\gamma$. The different curves correspond to $(n,\sigma) = (2^{16}, 0.1)$ (fine dash), $(n,\sigma) = (2^{16}, 0.3)$ (coarse dash), and $(n,\sigma) = (2^{10}, 0.1)$ (solid).



Three different scenarios for sample size and noise level in model (3.3) are considered, namely $(n,\sigma) \in \{(2^{10}, 0.1), (2^{16}, 0.1), (2^{16}, 0.3)\}$. For the implementation, we modified the estimator $\hat f_\gamma$ in the sense that on all blocks on which no $L^2$-signal is detected, the wavelet coefficients are always estimated by zero. More precisely, $\hat d_{j,k} = 0$ if for the block $B_{j,\ell}$ containing $(j,k)$, $\sum_{(j,k)\in B_{j,\ell}} Y_{j,k}^2 < \gamma\frac{\log n}{n}$. For convenience we used Haar wavelets and studied the estimator $\hat f_\gamma$ for $\gamma \in [3,15]$. The results based on 1000 repetitions are displayed in Figure 1. The upper panel gives $\gamma \mapsto E_f^{1/2}[\|\hat f_\gamma - f\|_2^2]$ for the three different scenarios. We find that the $L^2$-risk is minimal for $\gamma \in [4.5, 6]$. Notice that this is in accordance with Cai [2], who proposes to take $\gamma \approx 4.5052$, or more precisely, the root of $x - \log x - 3 = 0$. Furthermore, values for $\gamma$ that are too small inflate the risk. The lower panel of Figure 1 displays $\gamma \mapsto E_f[\|\hat f_\gamma - f\|_\infty]$. From the plot we deduce that the $L^\infty$-risk is minimal if $\gamma$ lies between six and eight, which is larger than the optimal value for $L^2$-risk. Choosing $\gamma < 6$ might lead to a much bigger $L^\infty$-risk.
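The constant $4.5052$ quoted from Cai [2] can be reproduced by solving $x - \log x - 3 = 0$ numerically. The sketch below uses plain bisection on the bracket $[1, 10]$; the bracket and the function name `cai_gamma` are choices made for this illustration. The equation has a second, smaller root in $(0,1)$, which is not the one of interest here.

```python
import math


def cai_gamma(tol: float = 1e-12) -> float:
    """Larger root of x - log(x) - 3 = 0, computed by bisection."""
    def g(x: float) -> float:
        return x - math.log(x) - 3

    lo, hi = 1.0, 10.0  # g(1) = -2 < 0 < g(10), and g is increasing on (1, inf)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


if __name__ == "__main__":
    print(round(cai_gamma(), 4))
```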

To summarize, looking at the $L^2$- and $L^\infty$-risk separately, we would like to pick different values of $\gamma$. However, for simultaneous estimation, we have to determine one value. For both risk functions it is less severe to choose $\gamma$ too large. Therefore, setting $\gamma = 7$ seems reasonable for practical use.

In Section 3 we have studied the behavior of the process $(L_j)_j$. In Figure 2 its distribution for the function (4.1) with $n = 2^{10}$ and $\sigma = 0.1$ is depicted. For every pair $(j,k)$ the probability $P(L_j = k)$ is estimated based on 10,000 repetitions. In the figure, these probabilities are converted into gray levels. Whereas the light gray/white parts correspond to unlikely/never occurring events, the coloring becomes darker as the probability increases and a value close to one is displayed by an almost black rectangle. For $n = 2^{10}$ the block length is seven and therefore the process $(L_j)_j$ takes values in the set $\{1,\ldots,7\}\cup\{\infty\}$. Our findings are in accordance with the behavior described in (3.6). On low resolution levels the process is one. For $j = 6$, we obtain that $L_j$ is likely to be two or three and in some cases also $L_7$ is non-trivial in the sense that $1 < L_7 < \infty$. For $j > 7$, $L_j = \infty$ with high probability.

Figure 2: Distribution of $(L_j)_j$.

Finally, let us comment on the fast computability of the estimator. A naive approach to compute $(L_j)_{0\le j\le J}$ requires $O(n^2/\log n)$ steps. By sorting the squared observations on every block first (using a fast algorithm this takes $O(\log n \log\log n)$ comparisons), the process $(L_j)_{0\le j\le J}$ can be effectively computed in $O(n\log\log n)$-time. Therefore, the main cost for the implementation comes from the discrete wavelet transform, requiring $O(n\log n)$ steps. To conclude, similarly to hard and block thresholding, the proposed estimator is computable in nearly linear time with respect to the sample size and is therefore also suitable for very large data sets.
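The sorting shortcut rests on the observation that, within a block, the smallest subset whose squared sum reaches a threshold can always be taken to consist of the largest squared observations. The following sketch compares a brute-force search over all subsets of one block with the sorted prefix-sum version; both function names are hypothetical and the code only illustrates the idea for a single block.

```python
import itertools
import math


def min_subset_size_naive(block, threshold):
    """Enumerate all subsets of the block by increasing size (exponential cost)."""
    squares = [v * v for v in block]
    for size in range(1, len(squares) + 1):
        for subset in itertools.combinations(squares, size):
            if sum(subset) >= threshold:
                return size
    return math.inf


def min_subset_size_sorted(block, threshold):
    """Sort the squares in decreasing order and accumulate until the threshold is met."""
    total = 0.0
    ordered = sorted((v * v for v in block), reverse=True)
    for size, value in enumerate(ordered, start=1):
        total += value
        if total >= threshold:
            return size
    return math.inf


if __name__ == "__main__":
    block = [0.5, 0.25, 0.25, 0.125]
    for thr in (0.25, 0.3, 0.39, 1.0):
        print(thr, min_subset_size_naive(block, thr), min_subset_size_sorted(block, thr))
```

Taking the minimum of the per-block results over all blocks of a resolution level then gives $L_j$.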
5 Proofs

Since it will always be clear which probability distribution we are referring to, we write $E = E_f$ and $P = P_f$.

Define $\mathcal{T}$ as the event on which for all $j \le J$ and any block $B_{j,\ell}$,

$$\sum_{(j,k)\in B_{j,\ell}} \varepsilon_{j,k}^2 \le 6.95\log n. \tag{5.1}$$

Notice that in total there are $\lesssim n/\log n$ blocks, and thus, by the lemma below,

$$P(\mathcal{T}^c) \lesssim \frac{n}{\log n}\,n^{-2} = \frac{1}{n\log n}. \tag{5.2}$$

Lemma 1. Let $\xi_L$ be chi-squared distributed with $L$ degrees of freedom, i.e. $\xi_L \sim \chi^2_L$. If $L \le m$ and $Q > e$, then,

$$P(\xi_L \ge Qm) \le \exp\Big(\frac{m}{2}(1 - Q + \log Q)\Big).$$

In particular, for $m = \lceil\log n\rceil$ and $n$ sufficiently large, $P(\xi_L \ge 6.95\log n) \le n^{-2}$.

Proof. Since $\xi_L$ is the sum of $L$ squared standard normal distributed random variables it is sufficient to consider $L = m$ only. Notice that for $t < 1/2$, $Ee^{t\xi_L} = (1-2t)^{-m/2}$. Thus, with $t = \frac12(1 - Q^{-1})$, $P(\xi_L \ge Qm) \le (1-2t)^{-m/2}e^{-Qmt} = \exp\big(\frac{m}{2}(1 - Q + \log Q)\big)$. □
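The role of the constant $6.95$ can be checked directly from the exponent in Lemma 1: the bound decays like $n^{-2}$ when the exponent per unit of $m$, namely $\frac12(1 - Q + \log Q)$, is at most $-2$. The sketch below evaluates this exponent and the resulting tail bound; the function names are illustrative only.

```python
import math


def chernoff_exponent(Q: float) -> float:
    """Exponent per unit of m in the bound of Lemma 1: (1 - Q + log Q) / 2."""
    return 0.5 * (1 - Q + math.log(Q))


def chi2_tail_bound(m: int, Q: float) -> float:
    """Upper bound exp(m/2 (1 - Q + log Q)) on P(chi^2_L >= Q m) for L <= m."""
    return math.exp(m * chernoff_exponent(Q))


if __name__ == "__main__":
    print(chernoff_exponent(6.95))  # just below -2
    print(chi2_tail_bound(10, 6.95))
```

At $Q = 6.95$ the exponent is only slightly below $-2$, which is consistent with the statement holding for sufficiently large $n$ only.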



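Lemma 2. Let $\gamma \ge 7$. Then, on the event $\mathcal{T}$ as defined in (5.1), for any block $B_{j,\ell}$ with $j \le J$, and any subset $S \subset B_{j,\ell}$,

$$\sum_{(j,k)\in S} d_{j,k}^2 \le 10^{-5}\frac{\log n}{n} \quad\text{implies}\quad \sum_{(j,k)\in S} Y_{j,k}^2 < \gamma\frac{\log n}{n}$$

and

$$\sum_{(j,k)\in S} d_{j,k}^2 \ge 4\gamma\frac{\log n}{n} \quad\text{implies}\quad \sum_{(j,k)\in S} Y_{j,k}^2 \ge \gamma\frac{\log n}{n}.$$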

(j,k)es (j,k)es 



Proof. The first implication follows immediately from $Y_{j,k}^2 \le 1001\,d_{j,k}^2 + \frac{1.001}{n}\varepsilon_{j,k}^2$ and the definition of $\mathcal{T}$. For the second use $\frac12 d_{j,k}^2 \le Y_{j,k}^2 + \frac1n\varepsilon_{j,k}^2$. □
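Proof of Theorem 1. Observe that for $j \le J_n$,

$$|\hat d_{j,k} - d_{j,k}| \le |Y_{j,k} - d_{j,k}| \wedge 2c2^{-\frac{j}{2}(2\beta+1)}.$$

$L^2$-rate: Uniformly over $f \in \Theta(\beta,Q)$,

$$\begin{aligned}
E\big[\|\hat f - f\|_2^2\big] &= \sum_{j=-1}^{J_n(\beta)}\sum_k E\big[(\hat d_{j,k} - d_{j,k})^2\big] + \sum_{j=J_n(\beta)+1}^{\infty}\sum_k d_{j,k}^2\\
&\le \sum_{j=-1}^{J_n(\beta)}\sum_k \frac1n + \sum_{j=J_n(\beta)+1}^{\infty}\sum_k d_{j,k}^2\\
&= O\big(n^{-1}2^{J_n(\beta)} + n^{-2\beta/(2\beta+1)}\big) = O\big(n^{-2\beta/(2\beta+1)}\big).
\end{aligned}$$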

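$L^\infty$-rate: Let $\tilde J_n(\beta)$ be an integer sequence such that $2^{\tilde J_n(\beta)} \asymp (n/\log n)^{1/(2\beta+1)}$. The loss $\|\hat f - f\|_\infty$ can be bounded by a constant multiple of

$$\sum_{j=-1}^{\infty} 2^{j/2}\max_k |\hat d_{j,k} - d_{j,k}|.$$

Define

$$\mathcal{S} := \Big\{(\varepsilon_{j,k}) : \max_{-1\le j\le J,\ k} |\varepsilon_{j,k}| \le \sqrt{10\log n}\Big\} \quad\text{and notice that}\quad P(\mathcal{S}^c) \lesssim n^{-4}. \tag{5.3}$$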



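Then, uniformly over $f \in \Theta(\beta,Q)$,

$$\begin{aligned}
\sum_{j=-1}^{\infty} 2^{j/2}\max_k |\hat d_{j,k} - d_{j,k}| &\le \sum_{j=-1}^{\tilde J_n(\beta)} 2^{j/2}\max_k \frac{|\varepsilon_{j,k}|}{\sqrt n} + \sum_{j=\tilde J_n(\beta)+1}^{\infty} 2^{j/2}\,2c2^{-\frac{j}{2}(2\beta+1)}\\
&\le \sum_{j=-1}^{\tilde J_n(\beta)} \frac{2^{j/2}}{\sqrt n}\max_k |\varepsilon_{j,k}| + O\Big(\Big(\frac{n}{\log n}\Big)^{-\beta/(2\beta+1)}\Big).
\end{aligned}$$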

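Taking expectation together with

$$\max_k |\varepsilon_{j,k}|\,I_{\mathcal{S}^c} \le \sum_k |\varepsilon_{j,k}|\,I_{\mathcal{S}^c} \le \frac12\sum_k \big(n^{-2}\varepsilon_{j,k}^2 + n^2 I_{\mathcal{S}^c}\big) \tag{5.4}$$

completes the proof. □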

Proof of Theorem 2. Recall that $f \in \Theta(\beta,Q)$ implies $|d_{j,k}| \le c2^{-\frac{j}{2}(2\beta+1)}$ for some finite constant $c$ and all $j \ge -1$, $k$. Hence, there are integers $\tilde J_n(\beta) < J_n(\beta) < J$ such that

$$2^{\tilde J_n(\beta)} \lesssim \Big(\frac{n}{\log n}\Big)^{1/(2\beta+1)} \quad\text{and}\quad \sup_{f\in\Theta(\beta,Q)}\ \max_{j>\tilde J_n(\beta),\ k} |d_{j,k}| \le \Big(\frac{\log n}{10^5 n}\Big)^{1/2}$$

as well as

$$2^{J_n(\beta)} \lesssim n^{1/(2\beta+1)} \quad\text{and}\quad \sup_{f\in\Theta(\beta,Q)}\ \max_{j>J_n(\beta),\ k} |d_{j,k}| \le (10^5 n)^{-1/2}.$$

All the estimates in this proof will be uniform over $f \in \Theta(\beta,Q)$. In particular, $\lesssim$ means smaller or equal up to a constant multiple that only depends on $\beta$, $Q$, and the underlying wavelet.

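$L^2$-rate: Note that

$$E\big[\|\hat f_\gamma - f\|_2^2\big] = E\Big[\sum_{j=-1}^{J}\sum_k (\hat d_{j,k} - d_{j,k})^2 + \sum_{j=J+1}^{\infty}\sum_k d_{j,k}^2\Big] = E\Big[\sum_{j=-1}^{J}\sum_k (\hat d_{j,k} - d_{j,k})^2\Big] + O(n^{-2\beta}). \tag{5.5}$$

For the main term

$$\begin{aligned}
\sum_{j=-1}^{J}\sum_k (\hat d_{j,k} - d_{j,k})^2 \le{}& \sum_{j=-1}^{J}\sum_k (\hat d_{j,k} - d_{j,k})^2 I_{\mathcal{T}^c} + \sum_{j=-1}^{J_n(\beta)}\sum_k d_{j,k}^2\, I_{\mathcal{T}\cap\{L_j=\infty\}}\\
&+ 2\sum_{j=-1}^{J_n(\beta)}\sum_k (Y_{j,k} - d_{j,k})^2 + 2\sum_{j=-1}^{J_n(\beta)}\sum_k (t_j^2 + d_{j,k}^2)\, I_{\mathcal{T}\cap\{1<L_j<\infty\}\cap\{|Y_{j,k}|>t_j\}}\\
&+ \sum_{j=J_n(\beta)+1}^{J}\sum_k (\hat d_{j,k} - d_{j,k})^2 I_{\mathcal{T}}\\
=:{}& (I) + (II) + (III) + (IV) + (V).
\end{aligned}$$

In the following we will bound the expectation of the five terms (I) to (V) separately.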



(I): Recall (5.3). Since 



(d j>k ~ d j>k ) 2 < 2d£ fc + 2dl k < U\ k + U\ t 



and |e 2 j, < 1 + J?e*,ft' we find by Lemma [I] 

J J A 

E t E Efe - d i>*) V] < e[ E E( 6 4^ + - e i*xw= + is-)] 



n 

j=— i ft i= — i ft 



2 



< (logn)P(T c ) + E[ £ J>d 2 fc + 1 + Za^fe)^ 6 



j=-i k 

2 



< (logn)P(T c ) + nP(S c ) + E[ £ £ -^J*] = Ofa" 1 )- 



j=-i ft 
(II): If $L_j = \infty$ then for every block on the resolution level $j$, we must have $\sum_{(j,k)\in B_{j,\ell}} Y_{j,k}^2 < \gamma\frac{\log n}{n}$, i.e. by negation of the second statement in Lemma 2, $\sum_{(j,k)\in B_{j,\ell}} d_{j,k}^2 < 4\gamma\frac{\log n}{n}$. Since on resolution level $j$, there are $\lesssim 2^j/\log n$ blocks,

$$(II) \le \sum_{j=-1}^{J_n(\beta)}\sum_k d_{j,k}^2\, I_{\mathcal{T}\cap\{L_j=\infty\}} \lesssim \sum_{j=-1}^{J_n(\beta)} \frac{2^j}{n} \lesssim n^{-2\beta/(2\beta+1)}.$$

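(IV): Fix a $j$ with $L_j > 1$. For every block $B_{j,\ell}$ denote by $V_{j,\ell}$ the set of observations $Y_{j,k}$ with $|Y_{j,k}| > t_j$. Suppose that there is an $\ell$ such that $|V_{j,\ell}| > L_j - 1$. Then, for every subset $R \subset V_{j,\ell}$ with $|R| = L_j - 1$,

$$\sum_{(j,k)\in R} Y_{j,k}^2 > (L_j - 1)\,t_j^2 = \gamma\frac{\log n}{n},$$

and this is in contradiction to the definition of $L_j$. This shows that $|V_{j,\ell}| \le L_j - 1$. Now, suppose there exists an $\ell$ such that $\sum_{(j,k)\in V_{j,\ell}} d_{j,k}^2 \ge 4\gamma\frac{\log n}{n}$. Then, on $\mathcal{T}$, by Lemma 2 also $\sum_{(j,k)\in V_{j,\ell}} Y_{j,k}^2 \ge \gamma\frac{\log n}{n}$, which because of $|V_{j,\ell}| \le L_j - 1$ is again a contradiction with the construction of $L_j$. Hence, for all $\ell$, $\sum_{(j,k)\in V_{j,\ell}} d_{j,k}^2 < 4\gamma\frac{\log n}{n}$ and

$$\sum_k (t_j^2 + d_{j,k}^2)\, I_{\{|Y_{j,k}|>t_j\}} = \sum_\ell \sum_{(j,k)\in V_{j,\ell}} (t_j^2 + d_{j,k}^2) \lesssim \sum_\ell \frac{\log n}{n} \lesssim \frac{2^j}{n}.$$

Summation over $j \le J_n(\beta)$ shows that $(IV) \lesssim n^{-2\beta/(2\beta+1)}$.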

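(V): By Lemma 2 and construction of $J_n(\beta)$ we find that for $j > J_n(\beta)$ and any $\ell$, $\sum_{(j,k)\in B_{j,\ell}} Y_{j,k}^2 < \gamma\frac{\log n}{n}$ on $\mathcal{T}$ and so $L_j = \infty$, i.e. $\hat d_{j,k} = 0$. This shows

$$(V) \le \sum_{j=J_n(\beta)+1}^{J}\sum_k d_{j,k}^2 \lesssim n^{-2\beta/(2\beta+1)}.$$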



Thus, from the estimates (I) to (V) and (5.5), we find $E[\|\hat f_\gamma - f\|_2^2] \lesssim n^{-2\beta/(2\beta+1)}$ and this completes the proof for the $L^2$-rate.

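$L^\infty$-rate: The $L^\infty$-norm of $\hat f_\gamma - f$ can be bounded by a constant multiple of

$$\sum_{j=-1}^{J} 2^{j/2}\max_k |\hat d_{j,k} - d_{j,k}| + \sum_{j=J+1}^{\infty} 2^{j/2}\max_k |d_{j,k}|.$$

The second term is of the (negligible) order $n^{-\beta}$ and thus it remains to show that

$$E\Big[\sum_{j=-1}^{J} 2^{j/2}\max_k |\hat d_{j,k} - d_{j,k}|\Big] \lesssim \Big(\frac{n}{\log n}\Big)^{-\beta/(2\beta+1)}.$$

Consider

$$\begin{aligned}
\sum_{j=-1}^{J} 2^{j/2}\max_k |\hat d_{j,k} - d_{j,k}| ={}& \sum_{j=-1}^{J} 2^{j/2}\max_k |\hat d_{j,k} - d_{j,k}|\, I_{\mathcal{T}^c} + \sum_{j=-1}^{\tilde J_n(\beta)} 2^{j/2}\max_k |\hat d_{j,k} - d_{j,k}|\, I_{\mathcal{T}}\\
&+ \sum_{j=\tilde J_n(\beta)+1}^{J_n(\beta)} 2^{j/2}\max_k |\hat d_{j,k} - d_{j,k}|\, I_{\mathcal{T}} + \sum_{j=J_n(\beta)+1}^{J} 2^{j/2}\max_k |\hat d_{j,k} - d_{j,k}|\, I_{\mathcal{T}}\\
=:{}& (i) + (ii) + (iii) + (iv).
\end{aligned} \tag{5.6}$$

In the following the expectations of the four summands on the r.h.s. are bounded separately.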



(i): Recall (5.3). Using $I_{\mathcal{T}^c} \le I_{\mathcal{S}\cap\mathcal{T}^c} + I_{\mathcal{S}^c}$ and (5.4), we obtain



J J 



E [ E ^ 2 m^\d j>k -d jtk \I T c] < (7loi^)P(r c )+E[ J2 EC^+^M 
j=-i j=-i fc 

J 1 

< (7loi^)P(r c )+n 3 P(5 c ) + E[ £ ^_ e 2 fc ] =0(n -i). 



i=-i k 



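(ii): Let $-1 \le j \le \tilde J_n(\beta)$ be arbitrary and assume that $\mathcal{T}$ holds. We need to consider $L_j \ne 1$ and $L_j = 1$ separately. If $L_j \ne 1$, then $\max_k |d_{j,k}| \le \sqrt{4\gamma\frac{\log n}{n}}$ by Lemma 2. This guarantees

$$2^{j/2}\max_k |\hat d_{j,k} - d_{j,k}|\, I_{\mathcal{T}\cap\{L_j\ne1\}} \le 2^{j/2}\big(t_j + \max_k |d_{j,k}|\big)\, I_{\mathcal{T}\cap\{L_j\ne1\}} \lesssim 2^{j/2}\sqrt{\frac{\log n}{n}}.$$

If $L_j = 1$, then

$$2^{j/2}\max_k |\hat d_{j,k} - d_{j,k}|\, I_{\mathcal{T}\cap\{L_j=1\}} \le 2^{j/2}n^{-1/2}\max_k |\varepsilon_{j,k}|\, I_{\mathcal{T}} \lesssim 2^{j/2}\sqrt{\frac{\log n}{n}}.$$

Since $j$ was arbitrary,

$$(ii) \lesssim \sum_{j=-1}^{\tilde J_n(\beta)} 2^{j/2}\sqrt{\frac{\log n}{n}} \lesssim \Big(\frac{n}{\log n}\Big)^{-\beta/(2\beta+1)}.$$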



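(iii): Fix an arbitrary $j$ with $\tilde J_n(\beta) < j \le J_n(\beta)$. By construction of $\tilde J_n(\beta)$ we cannot have $L_j = 1$ on $\mathcal{T}$. Thus, $1 < L_j \le \infty$. Suppose that

$$L_j < 10^{-5}c^{-2}2^{j(2\beta+1)}\frac{\log n}{n} \wedge \lceil\log n\rceil.$$

Then, on $\mathcal{T}$, for any subset $S \subset \{(j,k) : k\}$ of cardinality $|S| = L_j$ we have

$$\sum_{(j,k)\in S} d_{j,k}^2 \le L_j c^2 2^{-j(2\beta+1)} < 10^{-5}\frac{\log n}{n}$$

and, by Lemma 2, $\sum_{(j,k)\in S} Y_{j,k}^2 < \gamma\frac{\log n}{n}$. This is a contradiction to the definition of $L_j$. Therefore, we must have

$$10^{-5}c^{-2}2^{j(2\beta+1)}\frac{\log n}{n} \wedge \lceil\log n\rceil \le L_j \le \infty$$

and thus for the error on the $j$-th resolution level

$$\begin{aligned}
2^{j/2}\max_k |\hat d_{j,k} - d_{j,k}|\, I_{\mathcal{T}\cap\{L_j>1\}} &\le 2^{j/2}\big(t_j + \max_k |d_{j,k}|\big)\, I_{\mathcal{T}\cap\{L_j>1\}}\\
&\le 2^{j/2}\Big(\sqrt{\frac{\gamma\log n}{n(L_j-1)}} + \max_k |d_{j,k}|\Big)\, I_{\mathcal{T}\cap\{L_j>1\}}\\
&\lesssim 2^{-j\beta}.
\end{aligned}$$

Since $j$ was arbitrary in $\tilde J_n(\beta) < j \le J_n(\beta)$,

$$(iii) \lesssim \sum_{j=\tilde J_n(\beta)+1}^{J_n(\beta)} 2^{-j\beta} \lesssim \Big(\frac{n}{\log n}\Big)^{-\beta/(2\beta+1)}.$$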

(iv): We have $L_j = \infty$, i.e. $\hat d_{j,k} = 0$, for all $j > J_n(\beta)$ on $\mathcal{T}$. This is a consequence of the construction of $L_j$ and $J_n(\beta)$ as well as Lemma 2. Therefore,

$$(iv) \le \sum_{j=J_n(\beta)+1}^{J} 2^{j/2}\max_k |d_{j,k}| \lesssim n^{-\beta/(2\beta+1)}.$$

Together, (i) to (iv) and (5.6) complete the proof for the $L^\infty$-rate. □



Proof of Theorem 3. Fix arbitrary numbers $0 < \beta < s$, $0 < Q < \infty$, and an $s$-regular wavelet $\psi$. Recall that on every resolution level the coefficients are grouped in blocks of length $\lceil\log n\rceil$. Since $\lceil\log n\rceil$ does in general not divide the number of coefficients on a level, there might be one block of smaller length. Let $B_j$ be the union of all blocks on resolution level $j$ with length exactly $\lceil\log n\rceil$. Using the definition of a Hölder ball, i.e. (1.2), there is a constant $c$, not depending on $j$, such that for any $j \ge 0$, the function

$$f_j := \sum_{k\in B_j} c2^{-\frac{j}{2}(2\beta+1)}\psi_{j,k} \in \Theta(\beta,Q).$$
Choose an integer $J_n = J_n(\beta)$, such that



n V(2/m)< 2 J„<(^ n )V(2/m) 



and consider f° := fj n S @((3,Q). By Lemma^we find, 

sup E[||/ 7iB - /||oo] > E[[|/ 7)B - /°||ooIr] = -4e[[| V ej^j^Ulr] (5.7) 



For a function $g$, $\operatorname{supp} g$ will be its support. Let $I \subset B_{J_n}$ be such that $|I| \gtrsim 2^{J_n}$ and for every $k_1, k_2 \in I$ with $k_1 \ne k_2$ it follows that $\operatorname{supp}\psi_{J_n,k_1} \cap \operatorname{supp}\psi_{J_n,k_2} = \emptyset$. Such a set always exists due to the compact support of the wavelet function. Define $k^*$ by

$$k^* \in \operatorname*{argmax}_{k\in I}\ \varepsilon_{J_n,k}$$

and consider the sets $\mathcal{R} := \{\varepsilon_{J_n,k^*} \ge \sqrt{\log|I|}\}$ and $U(k^*) := \{k : k \ne k^* \text{ and } \operatorname{supp}\psi_{J_n,k} \cap \operatorname{supp}\psi_{J_n,k^*} \ne \emptyset\}$. It follows from the extreme value behavior of standard normal random variables (cf. Embrechts et al. [5], p. 145) that $P(\mathcal{R}^c) \le 1/2$ for sufficiently large $n$. For sufficiently small $\delta > 0$, there exists a random sequence $x_n \in (0,1)$ such that

$$\inf_n \big|2^{-J_n/2}\psi_{J_n,k^*}(x_n)\big| \ge \delta > 0.$$

By triangle inequality, 
E[||^e Jnifc Vj„,fc||ooIIr] >E[|e Jnifc ^ Jn>fc *(x n )|W] -E[| ]T ej„,^J„,fc(^)|] • (5.8) 

fe k&U{k*) 

To bound the second term, notice that $U(k^*)$ and $I$ are disjoint by construction. Since $k^*$ is a function of $\{\varepsilon_{J_n,k} : k \in I\}$ it is independent of $\varepsilon_{J_n,k}$ for all $k \in U(k^*)$. The same holds for $x_n$. Using $E[\cdot] = E[E[\cdot\,|\,k^*, x_n]]$ it follows

E[l E eJn*1>Jn*(*n)\] <2 J " /2 ||^||ool[ E < 2 J " /2 - (5.9) 

k£U(k*) k£U(k*) 



In order to find a lower bound of the first term in (5.8), observe that $P(\mathcal{T}\cap\mathcal{R}) \ge 1 - P(\mathcal{T}^c) - P(\mathcal{R}^c) \ge 1/4$ for sufficiently large $n$, since $P(\mathcal{T}^c) \to 0$ and $P(\mathcal{R}^c) \le 1/2$. Therefore,



Together with (5.7), (5.8), and (5.9) the result follows since by definition of /, J n , 



log n 



□ 